Derive the unit lexer from registered symbols by nertzy · Pull Request #29 · minad/unit

nertzy · 2026-06-12T18:47:51Z

The tokenizer used a frozen character class (/[a-zA-Z_°'"][\w°'"]*/) to recognize unit symbols. Any registered symbol built from characters outside that class was silently dropped to a dimensionless value rather than rejected -- so Unit(1, "Ω") and Unit(1, "%") returned a unitless 1, and compatible_with? on them raised. The same latent bug affected ℃, π, the arcminute/arcsecond quotes, a.u., Å, and ℉.
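To make the gap concrete, here is a minimal standalone sketch that tests a few of the symbols named above against the frozen character class. The constant and helper names (`FROZEN_SYMBOL`, `frozen_lexable?`) are hypothetical and exist only for this illustration; they are not part of the gem.

```ruby
# The frozen character class quoted in the description above.
FROZEN_SYMBOL = /[a-zA-Z_°'"][\w°'"]*/

# Hypothetical helper (not part of the gem): true only when the whole
# string is consumed as a single symbol token by the frozen class.
def frozen_lexable?(symbol)
  symbol.match?(/\A#{FROZEN_SYMBOL}\z/)
end

# Symbols made of ASCII letters or the whitelisted degree sign pass;
# Ω, % and a.u. contain characters the class never lists, so they fail.
%w[m kg ° Ω % a.u.].each do |symbol|
  puts "#{symbol}: #{frozen_lexable?(symbol)}"
end
```

In Ruby, `\w` only matches ASCII word characters unless Unicode matching is requested explicitly, which is why a glyph such as Ω falls outside the class even though it is a letter.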

Build the tokenizer and symbol matcher from the symbols actually registered in the system, plus an ASCII baseline, and memoize them. #load clears the memo, so a unit defined at runtime -- including one whose symbol uses a non-ASCII glyph -- becomes lexable without touching the lexer. A symbol the system has never registered still parses to a bare unit token, preserving prior behavior for unknown ASCII symbols.
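The approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `SketchSystem`, `ASCII_BASELINE`, `symbol_pattern` and `tokenize` are hypothetical names, and the real system stores much more than a list of symbol strings.

```ruby
# Hypothetical stand-in for a unit system; not the gem's actual class.
class SketchSystem
  # Assumed ASCII baseline so unknown ASCII symbols still become tokens.
  ASCII_BASELINE = /[a-zA-Z_][a-zA-Z0-9_]*/

  def initialize
    @symbols = []
    @symbol_pattern = nil
  end

  # Register symbols and clear the memo so the next lex sees them.
  def load(symbols)
    @symbols |= symbols
    @symbol_pattern = nil
    self
  end

  # Build the matcher from registered symbols (longest first, so "a.u."
  # wins over a bare "a"), then fall back to the ASCII baseline.
  def symbol_pattern
    @symbol_pattern ||= begin
      registered = @symbols.sort_by { |s| -s.length }
                           .map { |s| Regexp.escape(s) }
      Regexp.new((registered + [ASCII_BASELINE.source]).join("|"))
    end
  end

  # Return every symbol token found in the text.
  def tokenize(text)
    text.scan(symbol_pattern)
  end
end
```

With this sketch, a fresh `SketchSystem` tokenizes `"foo"` to a bare token but finds nothing in `"Ω"`; after `load(["Ω", "%", "a.u."])` each of those symbols is matched whole, with no change to the tokenizing code itself. Escaping each symbol with `Regexp.escape` keeps characters such as `.` literal.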

Add a spec that round-trips every symbol in the default system, which is what surfaced the dropped glyphs above.

nertzy merged commit da76701 into minad:master Jun 12, 2026
4 checks passed

nertzy deleted the dynamic-lexer branch June 15, 2026 20:32

nertzy merged 1 commit into minad:master from nertzy:dynamic-lexer
